This paper describes a line switching communication network where,
as the network becomes congested, additional lines are opened to relieve
congestion. A fixed cost is paid for each line opening
or closing operation; a cost is paid per unit time for each line in
use; a cost is paid per unit time for packet storage in a packet queue;
and a reward is earned for each packet transmission completed.

Under  assumptions  concerning  Poisson  arrival  and  service,  a  Markov 
model  can  be  formulated  which  determines  the  average  earning  rate  for 
the  network  under  a  specific  policy.  The  policy  is  the  control 
algorithm  which  specifies  when  lines  should  be  opened  or  closed.  The 
solution  technique  involves  replacing  an  infinite  subset  of  the  state 
space  by  a  finite  set  of  states  with  equivalent  behavior.  A  traditional 
technique  called  policy  iteration  can  then  be  applied  to  the  reduced 
finite  model.  The  algorithm  solves  for  the  optimal  policy,  i.e.  the 
policy  yielding  the  highest  earning  rate. 

The paper illustrates how optimal policies for the finite
optimization problem are also exactly optimal for the original infinite
problem. While the work describes how a line switching network can be
modelled and optimal control strategies determined, it also illustrates
a modelling technique which can be applied to other optimization
problems having an infinite state space.

LINK CAPACITY CONTROL IN A COMPUTER COMMUNICATION NETWORK

1. Introduction

Much of the work in the computer communication network area is
concerned with networks operating with fixed capacity links between
nodes. Typically the emphasis is on determining a fixed link capacity
for each link within the network. Many of the results in the area are
based on work by Kleinrock [1]. Several capacity assignment
algorithms are given in [2]. However, for networks with widely
varying traffic loads, it may be beneficial to design a system where the
link capacity is variable. This might be realistic for a network using
multiple dial-up telephone lines where the link capacity can be
increased by using another telephone line. In this paper, we will study
the problem of dynamically controlling the link capacity between two
nodes in a computer communication network. For a general network it is
simple to apply the algorithm link by link throughout the network. The
problem is treated using a Markov modeling technique where decision
theory produces an optimal set of controls or policies yielding maximum
average reward per unit time.

In section 2 we will define a Markov model describing the state of
the link between the two nodes. This will be an infinite state space
model which is difficult to solve using traditional analytic techniques.
For that reason, in section 3 we will show a method of reducing the
state space cardinality to that of a finite model. To do this we will
note that beyond some queue length the packet queue is so heavily loaded
that any optimal policy will open all available lines. In section 4 we
analyze this policy so that in section 5 we can show that this policy of
opening all lines is indeed optimal when the packet queue is heavily
loaded. Once the state space has been reduced, section 6 gives a brief
description of the algorithm for determining the optimal policy for line
management. Section 7 gives an example of the use of the algorithm and
section 8 contains a summary and conclusions.

2.  Model  Description 

Two nodes in a communication network communicate through an
integral number of unidirectional lines, each having fixed capacity,
costing a fixed charge per unit time, and incurring a fixed charge for
each line opening or closing operation. In the two node network shown in
figure 1, note that a queue of packets awaits transmission across the
link from node N1 to node N2. The link between nodes represents an
integral number of lines of equal bandwidth, where the number of lines is
determined as a function of queue length at node N1. Thus information
available at the source node is used to control the opening and closing
of L unidirectional identical lines connecting node N1 to node N2.

The inter-departure times of packets served at each line are
assumed exponentially distributed with parameter $\mu$. The inter-arrival
times of packets are also exponentially distributed but with parameter
$\lambda$. Consider a collection of $i$ open lines each serving packets
at rate $\mu$. The parallel collection of $i$ open lines can be replaced
by an equivalent single line with exponential inter-departure rate
$i\cdot\mu$. This holds while there are at least as many packets as
lines. When the number of packets in the queue, $j$, is less than the
number of lines, then only $j$ packets reside at a line server and the
effective rate is $j\cdot\mu$. Thus, if $i$ represents the number of open
lines and $j$ represents the packet queue length, the effective service
rate $S_{i,j}$ is specified by:

$$S_{i,j} = \min\{i, j\}\cdot\mu.$$
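As a small illustration, the effective service rate can be written as a
one-line function. This is only a sketch; the function and argument names
(`effective_service_rate`, `mu`) are illustrative choices and do not
appear in the paper.

```python
def effective_service_rate(i: int, j: int, mu: float) -> float:
    """Return S_{i,j} = min(i, j) * mu for i open lines and j queued packets."""
    if i < 1 or j < 0:
        raise ValueError("need i >= 1 open lines and j >= 0 packets")
    # At most min(i, j) lines can be busy at once, each serving at rate mu.
    return min(i, j) * mu
```

With three open lines and `mu = 2.0`, a queue of five packets is served at
rate 6.0, while a queue of one packet is served at rate 2.0.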

2.1  State  and  Policy 

The line control problem described above can be modeled by a Markov
process. Each state within the Markov process specifies the current
number of lines in use and the current queue length. Hold times and
transition probabilities are determined, and a reward structure is
defined, so that the Markov decision theory treated by Howard in [3]
may be used to maximize the reward per unit time, or gain, of the process.

The state set, $\{q_{i,j} \mid 1 \le i \le L,\ 0 \le j\}$, is defined so
that at each state $q_{i,j}$ exactly $i$ lines are open and $j$ packets
reside in the queue. Three possible policies exist at each state: a line
may be opened, a line may be closed, or the line count may remain
constant and the packet arrival/service process will operate. It is
assumed that a line opening or closing operation will occur
instantaneously. The policy function, $POL_{i,j} = \alpha$, represents
the selected line control policy at $q_{i,j}$. A policy function will be
chosen which will maximize the reward earned per unit time in steady
state operation.

The choice of policy must be restricted at some states. We define
the range of policy, $RP_{i,j} \subseteq \{r, o, c\}$, as the set of
allowed policies at state $q_{i,j}$, where $r$, $o$, and $c$ denote the
remain, open, and close policies respectively.

TABLE 1  Range of Policy

state index                  | $RP_{i,j}$
$i = 1$, $j \ge 0$           | $\{r, o\}$
$1 < i < L$, $j \ge 0$       | $\{r, o, c\}$
$i = L$, $j \ge 0$           | $\{r, c\}$

Note that for $i=1$ line closing is disallowed, while for $i=L$
line opening is disallowed. This limits both the maximum and the
minimum number of lines which are active at any moment in time.
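The range of policy in Table 1 can be sketched as a small helper. The
single-letter strings `"r"`, `"o"`, and `"c"` and the name
`allowed_policies` are illustrative assumptions, not notation fixed by the
paper.

```python
def allowed_policies(i: int, L: int) -> set:
    """Return the range of policy RP_{i,j} for a state with i of L lines open.

    The range does not depend on the queue length j.
    """
    if not 1 <= i <= L:
        raise ValueError("line index i must satisfy 1 <= i <= L")
    policies = {"r"}          # remaining is always allowed
    if i < L:
        policies.add("o")     # a further line can still be opened
    if i > 1:
        policies.add("c")     # at least one line must stay open
    return policies
```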


2.2 Holding Times

The average time spent within a state under a given policy is
defined to be the hold time $HT_{i,j,\alpha}$. Under the assumptions of
Poisson arrival and service, hold times can be computed from the process
parameters as follows:

TABLE 2  Hold Time for $\alpha \in RP_{i,j}$

policy                                         | $HT_{i,j,\alpha}$
$\alpha = r$ (remain)                          | $1/(S_{i,j} + \lambda)$
$\alpha = o$ (open) or $\alpha = c$ (close)    | $0$

for $1 \le i \le L$, $0 \le j$

2.3 Reward Structure


The reward structure describes the expected rewards minus the
expected costs at each state within the system. The pertinent
parameters are:

LTC   line cost per unit time
PSC   packet storage cost per unit time
REW   reward per packet for transmission
LOC   line opening cost
LCC   line closing cost

The reward $R_{i,j,\alpha}$ at state $q_{i,j}$, assuming the use of
policy $\alpha \in RP_{i,j}$, is expressed in the following table:

TABLE 3  Reward Function for $\alpha \in RP_{i,j}$

policy                   | $R_{i,j,\alpha}$
$\alpha = r$ (remain)    | $(REW\cdot S_{i,j} - i\cdot LTC - j\cdot PSC)\cdot HT_{i,j,r}$
$\alpha = o$ (open)      | $-LOC$
$\alpha = c$ (close)     | $-LCC$
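The hold time and reward tables translate directly into two functions.
The sketch below is illustrative only: the `LinkParams` container, the
function names, and the policy letters `"r"`, `"o"`, `"c"` are assumptions
made for the example, and the sample values are those of the example in
section 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkParams:
    lam: float  # packet arrival rate (lambda)
    mu: float   # service rate of a single line
    LTC: float  # line cost per unit time
    PSC: float  # packet storage cost per unit time
    REW: float  # reward per packet transmitted
    LOC: float  # line opening cost
    LCC: float  # line closing cost


def hold_time(p: LinkParams, i: int, j: int, policy: str) -> float:
    """Hold time HT_{i,j,alpha}; opening and closing are instantaneous."""
    if policy == "r":
        return 1.0 / (min(i, j) * p.mu + p.lam)
    if policy in ("o", "c"):
        return 0.0
    raise ValueError("policy must be 'r', 'o' or 'c'")


def reward(p: LinkParams, i: int, j: int, policy: str) -> float:
    """Reward R_{i,j,alpha} earned per visit to state q_{i,j}."""
    if policy == "o":
        return -p.LOC
    if policy == "c":
        return -p.LCC
    s = min(i, j) * p.mu  # effective service rate S_{i,j}
    return (p.REW * s - i * p.LTC - j * p.PSC) * hold_time(p, i, j, "r")


# Parameter values of the example in section 7.
example = LinkParams(lam=2.0, mu=1.0, LTC=3.0, PSC=3.0, REW=1.0, LOC=2.0, LCC=1.0)
```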

2.4  Transition  Matrix 

The transition probability from state $q_{i,j}$ to state $q_{m,n}$ under
policy $\alpha$ is defined by the function $P_{i,j,m,n,\alpha}$.

TABLE 4  Transition Probability for $\alpha \in RP_{i,j}$

policy                  | state index                        | $P_{i,j,m,n,\alpha}$
$\alpha = r$ (remain)   | $m = i$ and $n = j+1$              | $\lambda/(\lambda + S_{i,j})$
                        | $m = i$ and $n = \max\{0, j-1\}$   | $S_{i,j}/(\lambda + S_{i,j})$
                        | otherwise                          | $0$
$\alpha = o$ (open)     | $m = i+1$ and $n = j$              | $1$
                        | otherwise                          | $0$
$\alpha = c$ (close)    | $m = i-1$ and $n = j$              | $1$
                        | otherwise                          | $0$

A  state  diagram  for  the  line  control  Markov  process  is  shown  in 
figure  2. 
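The transition structure can be sketched as a function returning the
successor states of $q_{i,j}$ with their probabilities. The function name
and the policy letters `"r"`, `"o"`, `"c"` are illustrative assumptions;
the caller is assumed to pass only policies that are allowed at the state.

```python
def transition_probabilities(i: int, j: int, policy: str, lam: float, mu: float) -> dict:
    """Map each successor state (m, n) of q_{i,j} to its transition probability."""
    if policy == "o":
        return {(i + 1, j): 1.0}   # open one line, queue unchanged
    if policy == "c":
        return {(i - 1, j): 1.0}   # close one line, queue unchanged
    if policy != "r":
        raise ValueError("policy must be 'r', 'o' or 'c'")
    s = min(i, j) * mu             # effective service rate S_{i,j}
    total = lam + s
    successors = {(i, j + 1): lam / total}   # an arrival occurs first
    if s > 0:
        successors[(i, j - 1)] = s / total   # a departure occurs first
    return successors
```

With an empty queue the service rate is zero, so the only possible
transition under the remain policy is an arrival.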

3. Reducing State Space Cardinality

The  transition  probability  matrix  and  reward  structure  describe  the 
infinite  Markov  process  for  the  line  switching  communication  network. 
The  model  will  be  reduced  to  a  finite  Markov  process  by  representing 
occupancy  of  an  infinite  sub-set  of  states  within  the  process  by  entry 
into  a  single  aggregate  state  with  appropriate  mean  cost  and  hold  time. 
Using  this  approach,  we  construct  an  imbedded  finite  Markov  process 
which  describes  the  interesting  portion  of  the  policy  solution  domain. 

Let us partition the set of states into two regions called the
inner region, $\{q_{i,j} \mid 1 \le i \le L,\ j < K\}$, and the outer
region, $\{q_{i,j} \mid 1 \le i \le L,\ j \ge K\}$. The parameter $K$ is
a positive integer constant chosen to designate the beginning of the
regular portion of the optimal policy solution. In the outer region, we
will show that the packet buffer is so heavily loaded that any optimal
policy opens all lines to reduce future expected costs for maintaining
packets in the queue. This is called the all lines open policy.

All Lines Open Policy

$POL_{L,j} = r$ (remain), $j \ge K$
$POL_{i,j} = o$ (open), $1 \le i < L$, $j \ge K$

The outer region is shown in figure 3; note the use of the all lines
open policy for all $j \ge K$. In the outer region, where the queue is
sufficiently deep ($j \ge K$), the effective service rates, branch
probabilities, and hold times are:

$$s_i = i\cdot\mu, \qquad p_i = \lambda/(\lambda + s_i), \qquad ht_i = 1/(\lambda + s_i) \qquad \text{for } 1 \le i \le L$$

Lower case variables have been chosen here to indicate attributes of the
outer region. The constant $K$ must be chosen with $K \ge L$ to ensure
that there are enough packets in the queue to keep all $L$ servers busy.

The uppermost row ($i=L$) of the outer region describes the basic
queueing process which operates when the packet queue is heavily loaded
and the all lines open policy is employed. The set of states
$\{q_{L,j} \mid j \ge K\}$ forms an infinite Markov chain where the only
exit is the transition from $q_{L,K}$ to a member outside the set,
$q_{L,K-1}$. Then, for all $j \ge K$, $p_L$ is the probability that a
state $q_{L,j}$ makes the transition to $q_{L,j+1}$, while $1-p_L$ is the
probability of making the transition from $q_{L,j}$ to $q_{L,j-1}$. In
all future discussions, it will be assumed that $L\cdot\mu > \lambda$
or, equivalently, $p_L < 1/2$. When $p_L \ge 1/2$, the Markov process is
not positive recurrent, and the packet queue will grow without bound.

Define the earned reward function $e(n)$, $n \ge 0$, to be the average
cumulative reward earned after entry to state $q_{L,K+n}$, but before
departure from the uppermost row of the outer region,
$\{q_{L,j} \mid j \ge K\}$, by branching to $q_{L,K-1}$. We can write the
following balance equation by computing the earned reward $e(n)$ at state
$q_{L,K+n}$ in terms of the earned reward from its neighbors:

$$e(n) = R_{L,K+n,r} + (1-p_L)\cdot e(n-1) + p_L\cdot e(n+1) \quad \text{for } n \ge 0, \qquad e(-1) = 0 \tag{1}$$

The initial condition for the balance equation, $e(-1)=0$, implies that
no further reward is accumulated after the transition to $q_{L,K-1}$,
which coincides with departure from the outer region.

The earned reward function can be readily evaluated by exploiting
the symmetry of the uppermost row ($i=L$) of the outer region. Let $H$
represent the expected time before the sub-chain
$Q_K = \{q_{L,j} \mid j \ge K\}$ is exited, assuming that the process
starts in $q_{L,K}$. Let $E$ represent the expected reward accumulated
due to additional packet arrivals over the same interval. Then, the
earned reward recurrence, $e(n)$, expresses the fact that for any
$n \ge 0$, the sub-chain $Q_{K+n} = \{q_{L,j+n} \mid j \ge K\}$ has the
same structure as the sub-chain $Q_K$, except that an additional $n$
entries must be maintained in the queue as long as the Markov process
remains within $Q_{K+n}$. The hold time function, $h(n)$, computes the
time to exit $Q_K$ from any state $q_{L,K+n}$, hence $h(0)=H$. These two
recurrences are shown below:

$$e(n) = [a\cdot(K+n) + b]\cdot H + E + e(n-1) \quad\text{and}\quad h(n) = H + h(n-1), \qquad n \ge 0 \tag{2}$$

where the initial conditions are $e(-1)=0$ and $h(-1)=0$.

The parameters $a$ and $b$ are defined:

$$a = -PSC \quad\text{and}\quad b = REW\cdot s_L - L\cdot LTC, \quad\text{so that}\quad R_{L,j,r} = (a\cdot j + b)\cdot ht_L \quad\text{for } j \ge K.$$

Note that $a$ and $b$ above are the linear term in queue length $j$ and
the constant term for the reward earning rate at states in the outer
region. If the system is started in state $q_{L,K+n}$, the reward
recurrence implies that $K+n$ packets must be maintained for a time $H$
equal to the time the system remains within the subset of states
$\{q_{L,n+j} \mid j \ge K\}$, $E$ reflects additional costs from queue
entries accumulated because of transitions to the right, and $e(n-1)$
represents costs accumulated after entering $q_{L,K+n-1}$. Symmetry and
the Markov property dictate that $H$ and $E$ are well defined and
independent of index $n$. The solution to these two recurrences is shown
below:

$$e(n) = a\cdot[(n(n+1)/2)\cdot H] + (n+1)\cdot E + (a\cdot K + b)\cdot h(n) \quad\text{and}\quad h(n) = (n+1)\cdot H \quad\text{for all } n \ge 0 \tag{3}$$

The constants $H$ and $E$ can be determined by substituting the
solutions to the earned reward recurrence, (3), into the earned reward
balance equation, (1), and equating coefficients of $n$:

$$H = ht_L/(1 - 2\cdot p_L) \quad\text{and}\quad E = ht_L\cdot p_L\cdot a/(1 - 2\cdot p_L)^2.$$

Using  these  values  for  H  and  E,  the  solutions  for  e(n)  and  h(n)  can  be 
shown  to  satisfy  both  the  balance  equation  of  (1)  and  the  recurrences 
of  (2). 
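The closed forms for $H$, $E$, $e(n)$ and $h(n)$ can be checked numerically
against the balance equation (1) and the recurrences (2). The sketch below
does this in exact rational arithmetic; the function names and the use of
Python's `fractions` module are choices made for this illustration, and the
numeric inputs in the usage note are the parameter values of the example
in section 7.

```python
from fractions import Fraction as F


def outer_row_constants(L, lam, mu, PSC, REW, LTC):
    """Return (a, b, p_L, ht_L, H, E) for the uppermost row of the outer region."""
    lam, mu = F(lam), F(mu)
    s_L = L * mu
    p_L = lam / (lam + s_L)
    ht_L = 1 / (lam + s_L)
    if p_L >= F(1, 2):
        raise ValueError("requires L*mu > lambda, i.e. p_L < 1/2")
    a = -F(PSC)
    b = F(REW) * s_L - L * F(LTC)
    H = ht_L / (1 - 2 * p_L)
    E = ht_L * p_L * a / (1 - 2 * p_L) ** 2
    return a, b, p_L, ht_L, H, E


def h(n, H):
    """Closed form of the hold time function, equation (3)."""
    return (n + 1) * H


def e(n, K, a, b, H, E):
    """Closed form of the earned reward function, equation (3)."""
    return a * F(n * (n + 1), 2) * H + (n + 1) * E + (a * K + b) * h(n, H)


def satisfies_balance_and_recurrence(K, a, b, p_L, ht_L, H, E, n_max=25):
    """Check equations (1) and (2) for n = 0 .. n_max - 1."""
    for n in range(n_max):
        reward_here = (a * (K + n) + b) * ht_L          # R_{L,K+n,r}
        balance = reward_here + (1 - p_L) * e(n - 1, K, a, b, H, E) \
            + p_L * e(n + 1, K, a, b, H, E)
        recurrence = (a * (K + n) + b) * H + E + e(n - 1, K, a, b, H, E)
        if e(n, K, a, b, H, E) != balance or e(n, K, a, b, H, E) != recurrence:
            return False
        if h(n, H) != H + h(n - 1, H):
            return False
    return True
```

For L=5, $\lambda$=2, $\mu$=1, PSC=3, REW=1, LTC=3 and K=9, this gives
H = 1/3, E = -2/3 and e(0) = -13.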

The solutions to the earned reward recurrence allow us to express
the infinite line control Markov process by a finite one where, in the
uppermost row, an infinite set of states $\{q_{L,j} \mid j \ge K\}$ is
replaced by a single aggregate state $q'_{L,K}$ with appropriate mean
cost $e(0) = E + (K\cdot a + b)\cdot H$ and mean hold time $h(0)=H$. The
aggregate state $q'_{L,K}$ branches to $q_{L,K-1}$ with probability one,
thereby truncating the state space. The terminal states of lesser line
index, $q'_{i,K}$, $1 \le i < L$, will be constrained to employ the open
policy $o$, a choice which will be shown to be optimal. This finite line
control process is shown in figure 4. When $K$ is chosen sufficiently
large, the finite Markov process terminating at $q'_{i,K}$,
$1 \le i \le L$, describes an imbedded chain within the Markov chain
formed by any optimal policy for the original infinite process under
specific conditions which will be discussed later.


4. Relative Values under the All Lines Open Policy

In [3], Howard describes a procedure for the optimization of Markov
decision processes. The policy optimization procedure involves an
analysis step where, under a given policy, relative value equations are
evaluated by solving a system of linear equations, and a policy
improvement step where relative values from the previous policy are used
to determine a new policy yielding higher average reward per unit time.
When the average reward per unit time (the gain G) of the system is
identical in two successive policy iteration steps, the iteration has
converged and the associated policy is known to be optimal.

In this section, we will use the earned reward recurrence to
construct the relative values in the outer region. The relative value
equations for the line control Markov decision process appear as
follows: define $\alpha = POL_{i,j}$; then,

$$V_{i,j} = R_{i,j,\alpha} - G\cdot HT_{i,j,\alpha} + \sum_{m,n} P_{i,j,m,n,\alpha}\cdot V_{m,n} \tag{4}$$

for $1 \le i \le L$, $0 \le j$, $1 \le m \le L$, $0 \le n$.

In equation (4) above, $V_{i,j}$ is the relative value for state
$q_{i,j}$, and $G$ represents the process gain or average reward earned
per unit time. Once the relative values have been determined under a
given policy, a policy enhancement step is performed by maximizing the
value oriented test quantity:

$$\max_{\alpha \in RP_{i,j}} \Big\{ R_{i,j,\alpha} - G\cdot HT_{i,j,\alpha} + \sum_{m,n} P_{i,j,m,n,\alpha}\cdot V_{m,n} \Big\} \tag{5}$$

for $1 \le i \le L$, $0 \le j$, $1 \le m \le L$, $0 \le n$.

This  yields  a  policy  with  higher  gain  at  each  iteration  until  an  optimal 
policy  is  reached. 
The solutions of (3) for the earned reward and hold time
recurrences may be used to compute relative values for all states within
the uppermost row of the outer region, $\{q_{L,j} \mid j \ge K\}$. A
multistep relative value equation shown below will be used to compute
the value of each of these states in terms of the value for state
$q_{L,K-1}$, which will be given the arbitrary value $V$. This equation
determines $q_{L,K+n}$'s value $V_{L,K+n}$ directly in terms of $V$ and
the average cost and time required to reach $q_{L,K-1}$ from any initial
state $q_{L,K+n}$, $n \ge 0$:

$$V_{L,K+n} = -G\cdot h(n) + e(n) + V \tag{6}$$

for $0 \le n$, $K$ fixed and sufficiently large.

Here, $h(n)$ represents the average duration of stay within the outer
region assuming the system starts in $q_{L,K+n}$, $e(n)$ accounts for the
cost paid while within this region, and $V$ represents the relative value
of state $q_{L,K-1}$, which is reached after departing the outer region.

The relative values for states $q_{i,j}$ of lesser line index
($i < L$), under the assumption that $POL_{i,j} = o$ (open),
$1 \le i < L$, $j \ge K$, can now be computed by repeatedly using
equation (4) for lines $L-1, L-2, \ldots, 1$:

$$V_{i,K+n} = V_{L,K+n} - (L-i)\cdot LOC = -G\cdot h(n) + e(n) + V - (L-i)\cdot LOC \tag{7}$$

for $1 \le i \le L$, $0 \le n$.

The relative values of (7) can be shown to satisfy the relative value
equations of (4) where the all lines open policy is employed in the
outer region. Equation (4) has been modified to employ the all lines
open policy over the outer region, yielding the equations below:

$$V_{L,j} = R_{L,j,r} - G\cdot ht_L + (1-p_L)\cdot V_{L,j-1} + p_L\cdot V_{L,j+1} \quad\text{for } j \ge K$$

and

$$V_{i,j} = -LOC + V_{i+1,j} \quad\text{for } 1 \le i < L,\ j \ge K. \tag{8}$$

Since the relative values satisfy (8), they must be the correct relative
values of states within the outer region for the line control Markov
process under the all lines open policy.
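The claim that the closed-form relative values satisfy the outer-region
equations can be checked in exact arithmetic. The sketch below is an
illustration under stated assumptions: the parameter values are those of
the example in section 7, the function names are invented for this
example, and the gain `G` and boundary value `V` are free inputs (the
check is expected to pass for any choice of them).

```python
from fractions import Fraction as F

# Parameters of the example in section 7.
L, K = 5, 9
lam, mu = F(2), F(1)
PSC, REW, LTC, LOC = F(3), F(1), F(3), F(2)

s_L = L * mu
p_L = lam / (lam + s_L)
ht_L = 1 / (lam + s_L)
a, b = -PSC, REW * s_L - L * LTC
H = ht_L / (1 - 2 * p_L)
E = ht_L * p_L * a / (1 - 2 * p_L) ** 2


def h(n):
    """Expected time to leave the outer region from q_{L,K+n}."""
    return (n + 1) * H


def e(n):
    """Expected reward earned before leaving the outer region from q_{L,K+n}."""
    return a * F(n * (n + 1), 2) * H + (n + 1) * E + (a * K + b) * h(n)


def value(i, n, G, V):
    """Relative value V_{i,K+n}; for i = L, n = -1 it returns V (state q_{L,K-1})."""
    return -G * h(n) + e(n) + V - (L - i) * LOC


def satisfies_outer_equations(G, V, n_max=20):
    """Check both outer-region relative value equations for j = K .. K+n_max-1."""
    for n in range(n_max):
        reward_here = (a * (K + n) + b) * ht_L            # R_{L,j,r}
        rhs = reward_here - G * ht_L \
            + (1 - p_L) * value(L, n - 1, G, V) + p_L * value(L, n + 1, G, V)
        if value(L, n, G, V) != rhs:
            return False
        for i in range(1, L):
            if value(i, n, G, V) != -LOC + value(i + 1, n, G, V):
                return False
    return True
```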


5. Optimal Policy over the Outer Region

The motivation for the work above is based on the assumption that
the all lines open policy is optimal for states within the outer region.
Now that solutions for the relative values have been established for
states within the outer region under the all lines open policy, we shall
determine whether these values indicate that the all lines open policy
is indeed optimal. The conditions under which the optimality criterion
is satisfied will now be found by substituting the relative values from
(7) into the test quantity of (5). The all lines open policy is optimal
in the outer region exactly when (5) is maximized by selecting
$POL_{L,j} = r$ (remain) and $POL_{i,j} = o$ (open), $1 \le i < L$,
$j \ge K$.

Let us first consider the set of states within the uppermost row,
$\{q_{L,j} \mid j \ge K\}$. It may be quickly shown that in this
uppermost row of the outer region, the policy $POL_{L,j} = r$ yields a
test quantity which is at least as high as that of $POL_{L,j} = c$
(close) whenever LOC and LCC are non-negative. The policy $o$ is not
feasible and hence not a candidate for maximizing (5). Thus, the policy
$r$ maximizes (5) in the uppermost row whenever line switching costs are
non-negative.

For states of lesser line index, we will first show that the policy
$POL_{i,j} = o$ is strictly better than the policy $POL_{i,j} = r$.
Assume that the test value of (5) for $\alpha = o$ is greater than that
for $\alpha = r$, $1 \le i < L$, and the following inequalities result:

$$R_{i,j,o} - G\cdot HT_{i,j,o} + V_{i+1,j} > R_{i,j,r} - G\cdot HT_{i,j,r} + P_{i,j,i,j-1,r}\cdot V_{i,j-1} + P_{i,j,i,j+1,r}\cdot V_{i,j+1} \tag{9}$$

for $1 \le i < L$, $j \ge K$, or

$$0 > -V_{i+1,j} + LOC + (a\cdot j + b_i)\cdot ht_i - G\cdot ht_i + (1-p_i)\cdot V_{i,j-1} + p_i\cdot V_{i,j+1}$$

where $b_i = REW\cdot s_i - i\cdot LTC$ is the constant term of the reward
earning rate with $i$ lines open.

After substituting the relative values for the outer region into (9),
all second order terms in $j$ cancel. If PSC is assumed to be positive
and non-zero, the linear coefficient of $j$ is negative and once the test
inequality is satisfied, similar test quantities with higher index $j$
must also satisfy this inequality. Selecting linear terms in $j$ and
simplifying, we have:

$$a\cdot p_i < a\cdot(H - ht_i)/(2\cdot H), \qquad 1 \le i < L$$

but $ht_L < ht_i$, so for $a < 0$ it suffices that

$$a\cdot p_i < a\cdot(H - ht_L)/(2\cdot H) = a\cdot p_L$$

or, when $a = -PSC < 0$, we have: $p_i > p_L$.


It can also be shown that $POL_{i,j} = o$ (open) yields higher test
quantities than $POL_{i,j} = c$ (close), $j \ge K$, $1 < i < L$. Thus, if
$p_i > p_L$, $1 \le i < L$, then there exists an index $K$ such that the
optimal policy for all $q_{i,K+n}$, $1 \le i < L$, $n \ge 0$, is
$\alpha = o$. This follows because linear terms in $j$ within equation
(9) must eventually dominate constant terms for $j$ sufficiently large.
We conclude that whenever the effective service rate increases with
number of lines, the all lines open policy will be optimal for some
sufficiently large index $K$ denoting the beginning of the outer region.
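The linear-term condition and the simpler condition $p_i > p_L$ can be
evaluated for a given parameter set. The sketch below does so in exact
arithmetic; the function name is an illustrative choice, and the usage
values are those of the example in section 7.

```python
from fractions import Fraction as F


def outer_region_conditions(L, lam, mu, PSC):
    """For each i < L return (linear-term inequality holds, p_i > p_L)."""
    lam, mu, a = F(lam), F(mu), -F(PSC)
    p = {i: lam / (lam + i * mu) for i in range(1, L + 1)}
    ht = {i: 1 / (lam + i * mu) for i in range(1, L + 1)}
    if p[L] >= F(1, 2):
        raise ValueError("requires L*mu > lambda, i.e. p_L < 1/2")
    H = ht[L] / (1 - 2 * p[L])
    results = {}
    for i in range(1, L):
        linear_term_ok = a * p[i] < a * (H - ht[i]) / (2 * H)
        results[i] = (linear_term_ok, p[i] > p[L])
    return results
```

For L=5, $\lambda$=2, $\mu$=1 and PSC=3, both conditions hold for every
$i$ from 1 to 4.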

6.  Policy  Solution  over  the  Inner  Region 

Once the infinite outer region of the state space has been replaced
by the column of states $q'_{i,K}$, the solution of the line control
optimization problem becomes straightforward. Policy iteration
techniques may be used to determine optimal policies for line
management. The policy iteration cycle as described by Howard consists
of first solving the system of equations of (4), along with a ground
state definition equation (we chose $V_{1,0}=0$), for the relative values
and the gain $G$; then a higher gain policy is determined using the
previous relative values and the test quantity of (5). The iteration
halts when the same policy is selected twice.


Some difficulties may be encountered because certain policies will
yield a Markov process with multiple recurrent chains, complicating the
algebraic solution of the relative value equations, which are potentially
singular. This difficulty may be avoided either by using the policy
iteration technique modified to treat polydesmic processes [3], or by
carefully choosing an initial policy (e.g., all lines open over the
inner region) which defines exactly one recurrent chain within the
uppermost row. Since any state may experience an arbitrarily large
number of arrivals in a small interval of time, all states have a
non-zero probability of reaching some state $q_{i,j}$ with arbitrarily
high queue length index $j$. Thus, if the all lines open policy is
employed for all $j \ge K$, all states can reach a single recurrent chain
containing $\{q_{L,j} \mid j \ge K\}$. This is a sufficient condition to
ensure that a policy forms a monodesmic Markov process. Not only does
the initial policy form a monodesmic process, so do all successive
policies searched from this initial policy in the iteration cycle.

The choice of $K$ is sufficiently large whenever the optimal policy
chosen for $q_{i,K-1}$, $1 \le i \le L$, is identical to that chosen
under the all lines open policy. Thus, whenever the finite policy
iteration halts with an optimal policy where $POL_{L,K-1} = r$ (remain)
and $POL_{i,K-1} = o$ (open), $1 \le i < L$, this policy is optimal for
the infinite process where, of course, the all lines open policy is
employed in the outer region. This must be true because if (9) is
satisfied for $j=K-1$, it will also be satisfied for all $j \ge K$.
When larger than necessary values for $K$ are chosen, states which were
within the outer region for smaller choices of $K$ now lie within the
inner region; however, the resulting policy and relative values for
states within the smaller inner region will match those of corresponding
states within the larger inner region. Hence, any choice of $K$ which is
larger than necessary will still result in a correct solution.
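The sufficiency test for $K$ amounts to inspecting the last column of the
inner region. A minimal sketch follows, assuming the converged policy is
stored as a dictionary keyed by `(i, j)` with values `"r"`, `"o"` or `"c"`;
this representation and the function name are illustrative assumptions.

```python
def k_is_sufficient(policy: dict, L: int, K: int) -> bool:
    """Return True if column j = K-1 already follows the all lines open policy."""
    column = K - 1
    if policy[(L, column)] != "r":
        return False            # the top row must remain at L open lines
    # every lower row must open a further line
    return all(policy[(i, column)] == "o" for i in range(1, L))
```

If the check fails, a larger value of $K$ would be tried and the policy
iteration repeated.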

7. Example

Figure 5 illustrates an application of this algorithm. The process
parameters used in this example are shown in the table below:

TABLE 5  Process parameters for example of figure 5

L = 5             LTC = 3.
K = 9             PSC = 3.
$\lambda$ = 2.    REW = 1.
$\mu$ = 1.        LOC = 2.
                  LCC = 1.
The optimal policy resulting from use of the policy iteration algorithm
is shown within each state. Only transition arcs resulting from the
optimal policy are shown. The parameter K=9 denotes the beginning of
the outer region. Initially, K was chosen larger with identical
results; however, K=9 was selected as the minimum value of K which
still illustrates the optimality of the solution (all lines open for
j=K-1). The policy iteration algorithm converged to the optimal policy
from the initial all lines open policy in 7 iterations. Little is known
of the rate of convergence of the policy iteration algorithm. While the
number of possible policies is exponential in the number of states, all
example problems converged very quickly to an optimal policy.

The relative values and gain resulting from the policy iteration
algorithm are shown in figure 6. Note that for this example the gain is
negative, indicating that the network runs at a deficit. That is,
rewards for packet transmission cannot pay for the costs to maintain
lines, switch lines, and store packets; this results from a low choice of
the value for REW. The relative values indicate the relative merit of
residing within specific network states. The state $q_{1,0}$ was chosen
as the ground state and therefore has value zero. All other relative
values happen to be more negative than that of $q_{1,0}$ and are
therefore more costly as initial states from which to resume message
transmission.

Lines |                             Queue Length
Open  |    0       1       2       3       4       5       6       7       8
  5   |  -4.00   -8.22     --    -17.14  -21.65  -26.17  -31.68  -38.20  -45.72
  4   |  -3.00   -7.22     --    -16.14  -20.65  -26.39  -33.08  -40.20  -47.72
  3   |  -2.00   -6.22  -10.63   -15.25  -21.41  -28.39  -35.08  -42.20  -49.72
  2   |  -1.00   -5.22  -10.06   -16.63  -23.41  -30.39  -37.08  -44.20  -51.72
  1   |   0.00   -5.22  -12.06   -18.63  -25.41  -32.39  -39.08  -46.20  -53.72

Gain = -13.45

Figure 6

Note that each state's relative value grows more negative with
increasing queue depth, as the costs for packet storage are certain to
be higher.

8. Conclusions

The work above describes a model of line opening and closing
operation within a computer communication network and illustrates the
determination of optimal policies for line switching. The optimization
is carried out under the assumption of exponential packet arrival and
service. The assumption that packet service is exponential on each line
is fairly realistic since, in a global sense, it merely states that the
packet service rate on each line is independent of the state (queue
length and number of lines open). However, the assumption of
exponentially distributed message inter-arrival times is far more
questionable, since the intent of the control algorithm is to
dynamically vary the number of lines in an environment of changing
traffic load.

The algorithm which has been developed for an exponential arrival
process could be directly applied to a non-exponential arrival process,
producing a reasonably good heuristic algorithm. However, a number of
extensions lead to more sophisticated heuristic approaches.
If we assume that the arrival process is approximately exponential,
but the arrival rate is slowly varying in time, a superior approach
would be to estimate the current arrival rate and select a strategy
compatible with the current arrival rate estimate.

The first part of this procedure is to determine the optimal policy
over a wide range of arrival rates. This is done by statically
re-solving the problem for exponential arrivals over a wide range of the
arrival rate parameter $\lambda$ and determining specific ranges for
$\lambda$ where a particular policy is optimal or near optimal, assuming
an exponential process at the given rate. From this, the continuum of
the arrival rate parameter can be broken into ranges where a specific
policy is preferred. This entire operation is performed statically at
design time and is thus computationally feasible.
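One simple way to store the result of this design-time step is a sorted
list of arrival-rate breakpoints with one precomputed policy per range.
The sketch below is illustrative only: the `PolicyTable` class, the
breakpoint values, and the policy labels in the usage note are
hypothetical, and the convention that a rate equal to a breakpoint falls
into the upper range is an arbitrary choice.

```python
import bisect


class PolicyTable:
    """Maps an arrival-rate estimate to the policy precomputed for its range."""

    def __init__(self, breakpoints, policies):
        if len(policies) != len(breakpoints) + 1:
            raise ValueError("need exactly one policy per rate range")
        if list(breakpoints) != sorted(breakpoints):
            raise ValueError("breakpoints must be in increasing order")
        self.breakpoints = list(breakpoints)
        self.policies = list(policies)

    def lookup(self, rate):
        # bisect_right counts the breakpoints that are <= rate,
        # which is the index of the range containing the rate.
        return self.policies[bisect.bisect_right(self.breakpoints, rate)]
```

For example, `PolicyTable([1.0, 2.5], ["low", "mid", "high"]).lookup(2.0)`
selects the policy stored for the middle range.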

The second part of the procedure is to dynamically approximate the
instantaneous arrival rate of the running network in order to select the
most suitable policy from the tables produced above. The specific
approach here depends strongly on the nature of the non-exponential
arrival process, but a simple strategy will be described. If the source
node breaks up time into fixed-size intervals and measures the number
of arrivals within each interval, this sequence of interval measures can
be used as a statistic from which one can derive an approximate arrival
rate. For example, the rate estimator could, at each iteration, compute
a current rate estimate as a weighted average of the old rate estimate
and the current interval measure. This would geometrically decrease the
significance of old interval measures and allow the construction of a
simple rate estimator requiring very little computer time and memory
space. The rate estimate could then be used to select the appropriate
policy decision table over the next time interval. More exotic schemes
could be discussed but mean very little without a more careful
characterization of the true nature of the arrival process.
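The weighted-average estimator described above can be sketched in a few
lines. The class name, the `weight` parameter, and the initial estimate
are assumptions made for this illustration; the paper does not prescribe
particular values.

```python
class ArrivalRateEstimator:
    """Weighted average of the old rate estimate and the newest interval measure."""

    def __init__(self, interval: float, weight: float, initial_rate: float = 0.0):
        if interval <= 0:
            raise ValueError("interval length must be positive")
        if not 0 < weight <= 1:
            raise ValueError("weight must lie in (0, 1]")
        self.interval = interval   # length of each measurement interval
        self.weight = weight       # weight given to the newest measurement
        self.rate = initial_rate   # current rate estimate

    def update(self, arrivals: int) -> float:
        """Fold the arrival count of the interval just ended into the estimate."""
        measured = arrivals / self.interval
        self.rate = self.weight * measured + (1 - self.weight) * self.rate
        return self.rate
```

Only the current estimate is stored, so the contribution of an interval
measured k updates ago is scaled by `(1 - weight) ** k`.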




REFERENCES 


[1] L. Kleinrock, Communication Nets: Stochastic Message Flow
and Delay. McGraw-Hill, New York, 1964. Reprinted, Dover
Publications, 1972.

[2] M. Schwartz, Computer-Communication Network Design and
Analysis. Prentice-Hall, Inc., 1977.

[3] R. A. Howard, Dynamic Probabilistic Systems, Volume II:
Semi-Markov and Decision Processes. John Wiley and Sons,
Inc., 1971.


